Powerful knockoffs via minimizing reconstructability
Abstract
Model-X knockoffs (J. R. Stat. Soc. Ser. B. Methodol. 80 (2018) 551–577) allows analysts to perform feature selection using almost any machine learning algorithm while provably controlling the expected proportion of false discoveries. This procedure involves constructing synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs but, surprisingly, we prove this can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates joint dependencies between the features and knockoffs that allow machine learning algorithms to reconstruct the effect of the features on the response using the knockoffs. To improve power, we propose generating knockoffs that minimize the reconstructability (MRC) of the features, and we demonstrate our proposal by showing it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a notion of estimation error in Gaussian linear models. Through extensive simulations, we show that MRC knockoffs often dramatically outperform MAC-minimizing knockoffs, and we find no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a slight margin. We implement our methods and many others from the knockoffs literature in a new python package knockpy.
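The dependency problem described in the abstract can be illustrated with a small numerical sketch. It is not the paper's method or the knockpy API; it uses only NumPy and rests on several assumptions: the standard Gaussian knockoff construction, in which the joint covariance of features and knockoffs is [[Σ, Σ − S], [Σ − S, Σ]]; a restriction to S = s·I; an equicorrelated Σ with ρ = 0.6; and the trace of the inverse joint covariance as a stand-in for reconstructability, since the abstract does not state the MRC objective. The names `joint_covariance` and `trace_inverse` are hypothetical helpers.

```python
import numpy as np


def joint_covariance(sigma, s):
    """Covariance of (features, knockoffs) for Gaussian knockoffs with S = s * I."""
    p = sigma.shape[0]
    off_diag = sigma - s * np.eye(p)
    return np.block([[sigma, off_diag], [off_diag, sigma]])


# Exchangeable (equicorrelated) features with unit variance; rho is an assumed value.
p, rho = 5, 0.6
sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

# With unit variances, Cor(X_j, knockoff_j) = 1 - s, so minimizing MAC pushes s
# toward 1, up to the feasibility limit s <= 2 * lambda_min(sigma) = 2 * (1 - rho).
s_mac = min(1.0, 2 * (1 - rho))
min_eig_mac = np.linalg.eigvalsh(joint_covariance(sigma, s_mac)).min()


def trace_inverse(s):
    """Assumed reconstructability proxy: trace of the inverse joint covariance."""
    return np.trace(np.linalg.inv(joint_covariance(sigma, s)))


# Coarse grid search over feasible s, kept strictly inside (0, s_mac).
grid = np.linspace(0.01, s_mac - 0.01, 200)
s_best = grid[np.argmin([trace_inverse(s) for s in grid])]
min_eig_best = np.linalg.eigvalsh(joint_covariance(sigma, s_best)).min()

print(f"MAC-style s = {s_mac:.3f}, smallest joint eigenvalue = {min_eig_mac:.2e}")
print(f"proxy-minimizing s = {s_best:.3f}, smallest joint eigenvalue = {min_eig_best:.3f}")
```

Under these assumptions, the MAC-style choice drives the smallest eigenvalue of the joint covariance to numerical zero, meaning some linear combinations of features and knockoffs are deterministic. The proxy-minimizing choice keeps the joint covariance well conditioned, at the cost of a higher feature-knockoff correlation.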
Similar resources
Familywise Error Rate Control via Knockoffs
We present a novel method for controlling the k-familywise error rate (k-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Candès. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testin...
Robust inference with knockoffs
We consider the variable selection problem, which seeks to identify important variables influencing a response Y out of many candidate features X1, . . . , Xp. We wish to do so while offering finite-sample guarantees about the fraction of false positives—selected variables Xj that in fact have no effect on Y after the other features are known. When the number of features p is large (perhaps eve...
State-based Reconstructability Analysis
Reconstructability analysis (RA) is a method for detecting and analyzing the structure of multivariate categorical data. While Jones and his colleagues extended the original variable-based formulation of RA to encompass models defined in terms of system states, their focus was the analysis and approximation of real-valued functions. In this paper, we separate two ideas that Jones had merged tog...
Reconstructability analysis of epistasis.
The literature on epistasis describes various methods to detect epistatic interactions and to classify different types of epistasis. Reconstructability analysis (RA) has recently been used to detect epistasis in genomic data. This paper shows that RA offers a classification of types of epistasis at three levels of resolution (variable-based models without loops, variable-based models with loops...
Journal
Journal title: Annals of Statistics
Year: 2022
ISSN: 0090-5364, 2168-8966
DOI: https://doi.org/10.1214/21-aos2104